Introduction
Image classification is a fundamental problem in computer vision and machine learning. With advances in graphics processing unit (GPU) technology, it has been observed that image classification with deep Neural Networks (NNs) gives more accurate results than traditional methods [1],[2]. In deep NNs, input datasets are processed by deep convolutional layers, which learn feature representations hierarchically, beginning from low-level features and progressing to more abstract representations. In particular, Convolutional NNs (CNNs) were proposed because they preserve spatial relationships between image pixels [3]–[5].
Very deep CNNs are usually preferred because deeper NNs have more representational power [6]. Deeper networks gain this power by composing shallow feature representations hierarchically into deeper ones. However, when these networks begin to converge, a degradation problem occurs: as the number of layers is increased, accuracy saturates and then degrades rapidly. In other words, adding more layers to a deep NN increases training error. Training very deep CNNs is therefore hard due to vanishing gradients in the long forward and backward propagation process [7],[8].
Recently, the Residual NETwork (ResNET) has been applied in order to overcome the vanishing gradient problem in CNNs [9],[10]. A ResNET has shortcut connections parallel to the normal convolutional layers. Those shortcuts act like highways along which gradients can easily flow back. The clearest advantage of ResNETs is their fast training and convergence [9],[11].
Residual blocks and chosen activation functions have a major role in the success of training of NNs. However, their effect in deep NNs is still unclear for image classification. Therefore, in this work, four network models (Table I) have been implemented to analyze the effect of residual blocks and two activation functions, which are Rectified Linear Unit (ReLU) and Scaled Exponential Linear Unit (SELU) [12].
The network models designed without residual blocks (the 2nd and 4th models) correspond to plain networks. Therefore, in this study, performance evaluations have been performed for two plain networks and two residual-connection-based networks.
In this work, to evaluate the accuracies of these network models in image classification, they have been applied to a challenging problem: automated classification of skin diseases from colored digital images. For this purpose, the following common skin diseases have been handled: Acne (Fig. 1.a,b), Rosacea (Fig. 1.c), Hemangioma (Fig. 1.d), Psoriasis (Fig. 1.e), Seborrheic Dermatitis (Fig. 1.f).
All networks have been designed with 18 layers and trained for 100 epochs. The learning rate is 0.00001. Also, ADAptive Moment (ADAM) estimation, an efficient stochastic optimization method, has been used [13],[14].
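As an illustration of the ADAM optimizer mentioned above (a minimal NumPy sketch, not the authors' implementation; the toy weight vector and gradient are hypothetical), a single update step with the paper's learning rate of 0.00001 can be written as:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update: moment estimates, bias correction, parameter step."""
    m = beta1 * m + (1 - beta1) * grad       # first moment (running mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2  # second moment (running mean of squared gradients)
    m_hat = m / (1 - beta1 ** t)             # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# One step on a toy 3-element weight vector with a constant gradient
w = np.zeros(3)
m = np.zeros(3)
v = np.zeros(3)
w, m, v = adam_step(w, np.array([1.0, -1.0, 0.5]), m, v, t=1)
```

Note that on the first step the bias-corrected update is approximately lr times the sign of the gradient, which is why ADAM takes well-scaled steps from the start.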
The remaining sections are organized as follows: Section 2 gives brief background on the ResNET architecture for readers unfamiliar with it. Section 3 presents the experimental results obtained from the network models in terms of loss, validation loss, accuracy and validation accuracy. Section 4 presents the conclusions.
Background: ResNET Architecture
ResNETs consist of many residual blocks. A residual block is formulated by:
\begin{gather*} {Y_l} = h({X_l}) + F({X_l},{W_l})\tag{1} \\ {X_{l + 1}} = f({Y_l})\tag{2}\end{gather*}
In (1), the input feature of the residual block l is represented by X<sub>l</sub>, F refers to a residual function, and W<sub>l</sub> is the set of weights associated with the block. The term h(X<sub>l</sub>) = X<sub>l</sub> is an identity mapping, and f is an activation function. When f is also taken as an identity mapping, X<sub>l+1</sub> = Y<sub>l</sub>, and (1) and (2) reduce to: \begin{equation*}{X_{l + 1}} = {X_l} + F({X_l},{W_l})\tag{3}\end{equation*}
Therefore, applying (3) to the next block gives:
\begin{equation*}{X_{l + 2}} = {X_{l + 1}} + F\left( {{X_{l + 1}},{W_{l + 1}}} \right) = {X_l} + F\left( {{X_l},{W_l}} \right) + F\left( {{X_{l + 1}},{W_{l + 1}}} \right)\tag{4}\end{equation*}
Then, by recursion, the following equation is obtained:
\begin{equation*}{X_L} = {X_l} + \sum\limits_{i = l}^{L - 1} {F\left( {{X_i},{W_i}} \right)} \tag{5}\end{equation*}
where L refers to a deeper block and l refers to a shallower block. Fig. 2 shows a residual block [10].
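The iterative form X<sub>l+1</sub> = X<sub>l</sub> + F(X<sub>l</sub>, W<sub>l</sub>) and the unrolled sum X<sub>L</sub> = X<sub>l</sub> + Σ F(X<sub>i</sub>, W<sub>i</sub>) can be verified numerically with a minimal NumPy sketch; the toy residual function and random weights below are hypothetical stand-ins for a real convolutional block:

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_fn(x, w):
    """Toy residual function F(x, W): a single linear map followed by ReLU."""
    return np.maximum(w @ x, 0.0)

x = rng.standard_normal(4)                              # input feature X_l
weights = [rng.standard_normal((4, 4)) for _ in range(3)]  # W_l, W_{l+1}, W_{l+2}

# Iterative form: each block adds its residual on top of its input
x_deep = x
residuals = []
for w in weights:
    r = residual_fn(x_deep, w)
    residuals.append(r)
    x_deep = x_deep + r

# Unrolled form: the deep feature is the shallow feature plus all residuals
x_unrolled = x + sum(residuals)
```

Because the shortcut is an identity mapping, the two forms agree exactly (up to floating-point rounding), which is what lets gradients flow directly from deep blocks back to shallow ones.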
A. 1st Model: Network with ReLU, Batch Normalization and Residual Block
This network model has been designed with residual blocks and the ReLU activation function. ReLU is currently the most widely used and most successful activation function in deep NNs due to its effectiveness and simplicity [20].
The definition of ReLU is given by f(x) = max(x, 0) [21]. If the input to the ReLU activation function is positive then gradients are able to flow. Therefore, deep networks with ReLU activation functions can be optimized more easily compared to the networks with tanh or sigmoid units.
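The definition above and the gradient-flow property can be sketched as follows (an illustration only, with hypothetical sample inputs):

```python
import numpy as np

def relu(x):
    """ReLU: f(x) = max(x, 0)."""
    return np.maximum(x, 0.0)

def relu_grad(x):
    """The gradient is 1 for positive inputs and 0 otherwise,
    so gradients pass through positive units unscaled."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
y = relu(x)
g = relu_grad(x)
```

Since the gradient is exactly 1 on the positive side, repeated layers do not shrink it multiplicatively the way saturating tanh or sigmoid units do.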
To fix the mean and variance of the layer inputs, batch normalization, which normalizes each feature using mini-batch statistics, has been applied for each activation.
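A minimal sketch of the batch normalization computation, assuming the usual per-feature mini-batch statistics with a learnable scale and shift (fixed here to their defaults; the sample batch is hypothetical):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Toy batch of 3 samples with 2 features on very different scales
batch = np.array([[1.0, 10.0],
                  [3.0, 30.0],
                  [5.0, 50.0]])
normed = batch_norm(batch)
```

After normalization each feature has (approximately) zero mean and unit variance over the batch, regardless of its original scale.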
B. 2nd Model: Network with ReLU, Batch Normalization, without Residual Block
The 2nd model has been designed without using residual blocks to see their effects. This network model has been constructed with ReLU activation function and batch normalization.
C. 3rd Model: Network with SELU, Residual Block, without Batch Normalization
Residual blocks and the SELU activation function, which was first used in [22], have been applied in this model. Similar to ReLU, SELU can overcome the vanishing gradient problem in NNs. Also, in some cases, it can provide better performance than ReLU [23]. SELU is formulated by:
\begin{equation*} f\left( x \right) = \lambda \begin{cases} x&{\text{if } x \geq 0} \\ {\alpha {e^x} - \alpha }&{\text{if } x < 0} \end{cases} \tag{6}\end{equation*}
(with λ ≈ 1.0507 and α ≈ 1.6733). Networks designed with SELU do not have batch normalization layers, since SELU is self-normalizing [20].
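A sketch of SELU with its standard constants (the constants come from the original SELU formulation, not from this work's code):

```python
import numpy as np

# Standard SELU constants: lambda ≈ 1.0507, alpha ≈ 1.6733
LAMBDA = 1.0507009873554805
ALPHA = 1.6732632423543772

def selu(x):
    """SELU: lambda * x for x >= 0, lambda * (alpha * e^x - alpha) for x < 0."""
    # np.expm1(x) computes e^x - 1 accurately, so alpha*expm1(x) = alpha*e^x - alpha
    return LAMBDA * np.where(x >= 0, x, ALPHA * np.expm1(x))
```

Unlike ReLU, SELU is negative for negative inputs and saturates at -λα ≈ -1.758, which is what drives activations toward zero mean and unit variance without batch normalization.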
D. 4th Model: Network with SELU, without Batch Normalization and Residual Block
The final model has been designed using SELU activation function without batch normalization and residual block.
Experimental Results
These four network models have been applied to 100 colored digital images showing skin diseases. Loss and accuracy values have been obtained from these models to evaluate their image classification performance for five common skin diseases.
In this work, a balanced number of images has been used for the five classes. 70% of the images were selected randomly for training, 15% were used for validation, and the remaining 15% were used for testing.
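The 70/15/15 random split described above can be sketched as follows; splitting by shuffled indices is an assumption for illustration, not the paper's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical dataset of 100 images, referenced by index only
n_images = 100
indices = rng.permutation(n_images)  # random shuffle of 0..99

n_train = int(0.70 * n_images)  # 70 images
n_val = int(0.15 * n_images)    # 15 images

train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]  # remaining 15 images
```

Shuffling once and slicing guarantees the three subsets are disjoint and together cover the whole dataset.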
Fig. 3 shows the loss and validation loss values obtained by the 1st and 2nd models. These models have been designed with the ReLU activation function. The difference between the loss values of these two models shows the effect of the residual blocks used in the 1st model.
Loss and validation loss scores computed by the 1st model (ReLU, batch normalization and residual block) and 2nd model (ReLU and batch normalization)
Fig. 4 shows the loss and validation loss values obtained by the 3rd and 4th models. These models have been designed with the SELU activation function. The difference between the loss values of these two models shows the effect of the residual blocks used in the 3rd model.
Loss and validation loss scores computed by the 3rd model (SELU and residual block) and 4th model (SELU)
Comparative results of the loss values obtained from the 1st and 3rd model are presented in Fig. 5 to indicate the effect of ReLU and SELU activation functions.
Loss and validation loss scores computed by each model can be seen in Fig. 6.
In addition to these loss values, the results indicating accuracy of the classification have been obtained. Fig. 7 shows the accuracy and validation accuracy values obtained by the 1st and 2nd model to examine the effect of residual block used in the 1st model.
Loss and validation loss scores computed by the 1st model (ReLU, batch normalization and residual block) and 3rd model (SELU and residual block)
Accuracy and validation accuracy values obtained by the 1st model (ReLU, batch normalization and residual block) and 2nd model (ReLU and batch normalization)
The accuracy performance of the models designed with the SELU activation function (the 3rd model with residual blocks and the 4th model without) is presented in Fig. 8.
Accuracy and validation accuracy values obtained by the 3rd model (SELU and residual block) and 4th model (SELU)
The accuracy values obtained from the 1st and 3rd models are compared in Fig. 9 to indicate the effect of the ReLU and SELU activation functions.
Accuracy and validation accuracy values obtained by the 1st model (ReLU, batch normalization, and residual block) and 3rd model (SELU and residual block)
Fig. 10 shows the accuracy and validation accuracy values obtained by each model for comparative evaluations.
The validation loss and validation accuracy results for each model are listed in Table II. The highest validation accuracy observed is 97.01% and the lowest is 95.57%.
Conclusion
In this work, four network models have been examined to see the effects of two activation functions (ReLU and SELU) and of residual blocks on image classification. Comparative analyses of these models have been performed using the results obtained from skin disease classification on colored images.
Easing of the vanishing gradient problem by the residual blocks, together with the self-normalization of SELU, provides stability in the training of the network models (Fig. 7, Fig. 8).
The network models designed with the SELU activation function show similar fluctuations in their accuracy values (Fig. 8).
The network model designed with SELU and without residual blocks gives the highest validation accuracy. On the other hand, the lowest validation loss is obtained by the network model designed with ReLU and residual blocks (Fig. 6, Fig. 10).
Experimental results show that automated classification of five skin diseases can be performed with high accuracy using deep networks. These models will be tested with an increased number and variety of images as an extension of this work.
ACKNOWLEDGMENT
This work has been supported by The Scientific and Technological Research Council of Turkey (TUBITAK-118E777).